In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
train = pd.read_csv(r'C:\Users\hrao\Documents\Personal\HK\Python\train.csv')
test = pd.read_csv(r'C:\Users\hrao\Documents\Personal\HK\Python\test.csv')
The plot shows survival rates: each bar is the mean of `Survived`, i.e. the share of passengers in that group who survived. The survival rate of female passengers was significantly higher than that of male passengers. First class had a higher overall survival rate than any other class.
Third class had the lowest overall survival rate.
The male survival rate in first class was about twice that in second or third class. The female survival rate in first class was about twice that in third class.
In [12]:
sns.barplot(x='Pclass',y='Survived',data=train, hue='Sex')
Out[12]:
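The rates read off the bars can also be checked numerically. The following is a minimal sketch, assuming the column names `Pclass`, `Sex` and `Survived` used in the plot call; the helper name `survival_rates` and the small demo frame are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

def survival_rates(df):
    """Mean of Survived (0/1) per class and sex, the quantity sns.barplot draws by default."""
    return df.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack('Sex')

# Tiny made-up frame, only to show the shape of the result.
demo = pd.DataFrame({
    'Pclass':   [1, 1, 1, 1, 3, 3, 3, 3],
    'Sex':      ['female', 'female', 'male', 'male',
                 'female', 'female', 'male', 'male'],
    'Survived': [1, 1, 1, 0, 1, 0, 0, 0],
})
rates = survival_rates(demo)
print(rates)
```

On the real data, `survival_rates(train)` should give the table behind the bars, with one row per class and one column per sex.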
The plot shows the same survival rates in a different arrangement, grouped by sex with one bar per class.
In [13]:
sns.barplot(x='Sex',y='Survived',data=train, hue='Pclass')
Out[13]:
The plot shows the distribution of survivors across age and class. The concentration of red points in the lower part of the left swarm indicates that younger third-class passengers had the lowest chance of survival.
The blue points in the upper part of the right swarm indicate that elderly first-class passengers had the best chance of survival.
The blue points are spread evenly across the right swarm, indicating that first-class passengers had a better chance of survival irrespective of age.
In [30]:
sns.swarmplot(x='Survived',y='Age',hue='Pclass',data=train)
Out[30]:
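A swarm plot is hard to read quantitatively, so the age and class pattern can be cross-checked by binning ages and computing the survival rate per band. This is a sketch under stated assumptions: the band edges and labels below are arbitrary choices, the helper name `survival_by_age_band` is hypothetical, and the demo frame is made up.

```python
import pandas as pd

# Arbitrary age bands chosen for illustration only.
EDGES = [0, 12, 18, 40, 60, 100]
LABELS = ['child', 'teen', 'adult', 'middle-aged', 'elderly']

def survival_by_age_band(df):
    """Survival rate per age band and class; rows with a missing Age are dropped."""
    bands = pd.cut(df['Age'], bins=EDGES, labels=LABELS)
    grouped = df.groupby([bands, 'Pclass'], observed=True)['Survived'].mean()
    return grouped.unstack('Pclass')

# Tiny made-up frame, only to show the shape of the result.
demo = pd.DataFrame({
    'Age':      [5, 8, 30, 35, 70, 65, None],
    'Pclass':   [3, 3, 1, 3, 1, 1, 2],
    'Survived': [0, 1, 1, 0, 1, 1, 1],
})
by_band = survival_by_age_band(demo)
print(by_band)
```

Calling `survival_by_age_band(train)` would give one row per age band and one column per class, which makes it easier to judge whether survival in first class really is independent of age.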
The plot shows that male passengers had a lower chance of survival than female passengers.
In [33]:
sns.swarmplot(x='Survived',y='Age',hue='Sex',data=train)
Out[33]:
The same passengers plotted by sex alone. This view shows the age distribution of each sex without the survival split.
In [38]:
sns.swarmplot(x='Sex',y='Age',data=train)
Out[38]:
The plot shows the mean fare for each class of travel. A first-class ticket cost about four times as much as a second-class ticket.
A third-class ticket cost about three quarters as much as a second-class ticket.
In [36]:
sns.pointplot(x='Pclass',y='Fare',data=train)
Out[36]:
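The fare ratios quoted above are read off the plot by eye; they can be computed directly from the mean fare per class. A minimal sketch, assuming the `Pclass` and `Fare` columns used in the plot call. The helper name `fare_ratios` is hypothetical and the demo fares are made-up round numbers, not values from the dataset.

```python
import pandas as pd

def fare_ratios(df, baseline=2):
    """Mean fare per class divided by the mean fare of the baseline class."""
    mean_fare = df.groupby('Pclass')['Fare'].mean()
    return mean_fare / mean_fare.loc[baseline]

# Made-up fares, only to show how the ratios come out.
demo = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 3, 3],
    'Fare':   [100.0, 60.0, 20.0, 20.0, 10.0, 10.0],
})
ratios = fare_ratios(demo)
print(ratios)
```

With the real data, `fare_ratios(train)` expresses the first-class and third-class mean fares as multiples of the second-class mean fare.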
The plot shows how mean fares differ by port of embarkation.
Fares from Cherbourg were the highest, costing about twice as much as fares from Southampton and about four times as much as fares from Queenstown.
Fares from Southampton cost about twice as much as fares from Queenstown.
C = Cherbourg, Q = Queenstown, S = Southampton
In [48]:
sns.barplot(x='Embarked',y='Fare',data=train)
Out[48]:
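The comparison between ports can be tabulated with the port codes expanded to full names. This is a minimal sketch, assuming the `Embarked` and `Fare` columns used in the plot call and the C/Q/S legend given above; the helper name `mean_fare_by_port` is hypothetical and the demo fares are made up.

```python
import pandas as pd

# Port codes as given in the legend above.
PORT_NAMES = {'C': 'Cherbourg', 'Q': 'Queenstown', 'S': 'Southampton'}

def mean_fare_by_port(df):
    """Mean fare per port of embarkation, highest first; rows with no port are dropped."""
    ports = df['Embarked'].map(PORT_NAMES)
    return df.groupby(ports)['Fare'].mean().sort_values(ascending=False)

# Made-up fares, only to show the shape of the result.
demo = pd.DataFrame({
    'Embarked': ['C', 'C', 'S', 'S', 'Q', None],
    'Fare':     [60.0, 40.0, 30.0, 20.0, 10.0, 99.0],
})
port_fares = mean_fare_by_port(demo)
print(port_fares)
```

Dividing the result of `mean_fare_by_port(train)` by its Queenstown entry would give the fare multiples between ports directly.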